Efficient Term Extraction and Indexing Approach in Small-Scale Web Search of Uyghur Language
Authors
Abstract
In small-scale web search, the index should be kept in memory to avoid frequent hard disk reads and writes and to speed up search. However, representing the original information in less memory requires index compression, which increases computational cost or loses some of the original information. In this study of small-scale Uyghur web search, the inverted index is built on a hash table data structure and kept entirely resident in memory to speed up retrieval and querying. For index compression, no compression technique is used. Instead, we propose a word grouping approach based on a simplified N-gram statistical model that extracts semantic words that are structurally stable, semantically complete, and independent, which greatly reduces the size of the index term list. This not only serves the purpose of index compression but also resolves the ambiguity problem to a certain extent and noticeably improves search precision. The experimental results indicate that the method is feasible and effective.
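As a rough illustration of the two ideas in the abstract (a hash-table inverted index held in memory, and grouping adjacent words into a single index term using N-gram statistics), here is a minimal Python sketch. The abstract does not specify the simplified N-gram model, so the sketch assumes a plain bigram count with a fixed threshold. The function names `group_terms`, `build_index`, and `search`, the `min_count` parameter, and whitespace tokenization are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def group_terms(tokens, bigram_counts, min_count=2):
    """Greedily merge adjacent tokens whose bigram count reaches min_count.

    Assumption: a raw bigram frequency threshold stands in for the paper's
    (unspecified) simplified N-gram statistical model.
    """
    terms = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and bigram_counts[pair] >= min_count:
            terms.append(" ".join(pair))  # treat the frequent pair as one term
            i += 2
        else:
            terms.append(tokens[i])
            i += 1
    return terms


def build_index(docs, min_count=2):
    """Build an in-memory inverted index (a dict, i.e. a hash table).

    docs maps a document id to its text. Returns the index, which maps
    each term to the set of document ids containing it, and the bigram counts.
    """
    tokenized = {doc_id: text.split() for doc_id, text in docs.items()}

    # First pass: collect bigram statistics over the whole collection.
    bigram_counts = Counter()
    for tokens in tokenized.values():
        bigram_counts.update(zip(tokens, tokens[1:]))

    # Second pass: index grouped terms instead of single words.
    index = defaultdict(set)
    for doc_id, tokens in tokenized.items():
        for term in group_terms(tokens, bigram_counts, min_count):
            index[term].add(doc_id)
    return index, bigram_counts


def search(index, bigram_counts, query, min_count=2):
    """Return ids of documents that contain every grouped query term."""
    terms = group_terms(query.split(), bigram_counts, min_count)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

In this sketch the query is grouped with the same statistics as the documents, so a frequent word pair is looked up as one key. A side effect of this simplified version is that a word that only ever occurs inside a grouped pair is not indexed on its own.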
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
Nowadays, high fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns that are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...
Integrating RDF Querying Capabilities into a Distributed Search Infrastructure
The Semantic Web is inherently distributed, and covers both metadata and full-text information. Semantic search therefore can profit a lot from peer-to-peer infrastructures as well as from powerful metadata search functionalities based on full-text search technologies. In this paper we focus on an approach extending an existing P2P search infrastructure with RDF querying capabilities, which bot...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
Query-Driven Indexing in Large-Scale Distributed Systems
Efficient and effective search in large-scale data repositories requires complex indexing solutions deployed on a large number of servers. Web search engines such as Google and Yahoo! already rely upon complex systems to be able to return relevant query results and keep processing times within the comfortable sub-second limit. Nevertheless, the exponential growth of the amount of content on the...
Journal title:
- Journal of Multimedia
Volume 8, Issue
Pages -
Publication date: 2013